Easy Accurate Reading and Writing of Floating-Point Numbers
Author
Abstract
Presented here are algorithms for converting between (decimal) scientific-notation and (binary) IEEE-754 double-precision floating-point numbers. These algorithms are much simpler than those previously published. The values are stable under repeated conversions between the formats. The scientific representations generated have only the minimum number of mantissa digits needed to convert back to the original binary values. Also presented is an algorithm for printing IEEE-754 double-precision floating-point numbers with a specified number of decimal digits of precision. For the specified number of digits, the decimal numbers produced are the closest possible to the binary values.

Introduction

Articles from Steele and White[SW90], Clinger[Cli90], and Burger and Dybvig[BD96] establish that binary floating-point numbers can be converted into and out of decimal representations without losing accuracy, using a minimum number of (decimal) significant digits. The lossless algorithms from these papers all require high-precision integer calculations, although not for every conversion.

In How to Read Floating-Point Numbers Accurately[Cli90], Clinger astutely observes that successive rounding operations do not have the same effect as a single rounding operation. This is the crux of the difficulty with both reading and writing floating-point numbers. But instead of constructing his algorithm to perform a single rounding operation, Clinger and the other authors follow Matula[Mat68, Mat70] in doing successive integer divisions and remainders.

Both Steele and White[SW90] and Clinger[Cli90] claim that the input and output problems are fundamentally different from each other because the floating-point format has a fixed precision while the decimal representation does not. Yet bignum rounding divisions accomplish accurate conversions in both directions.
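The reading direction can be sketched in Python, whose built-in integers serve as bignums. This is a minimal illustration of the single-rounding idea (one exact bignum division with round-to-nearest, ties-to-even), not the paper's exact algorithm; it assumes a positive mantissa and ignores subnormals and overflow:

```python
from math import ldexp

def read_float(mantissa: int, exp10: int) -> float:
    """Nearest double to mantissa * 10**exp10 (mantissa > 0),
    computed with a single round-half-to-even bignum division."""
    # Express the exact value as a ratio of two big integers.
    if exp10 >= 0:
        num, den = mantissa * 10**exp10, 1
    else:
        num, den = mantissa, 10**-exp10
    # Scale by a power of two so the quotient has 53 bits.
    e2 = num.bit_length() - den.bit_length() - 53
    if e2 >= 0:
        den <<= e2
    else:
        num <<= -e2
    q, r = divmod(num, den)
    if q.bit_length() > 53:          # scale estimate was one bit short
        den <<= 1
        e2 += 1
        q, r = divmod(num, den)
    # The single rounding: round to nearest, ties to even.
    if 2 * r > den or (2 * r == den and q & 1):
        q += 1
    return ldexp(q, e2)
```

Because the remainder of the division is exact, the one rounding decision sees the full value, so no double-rounding error can occur.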
The algorithms from How to Print Floating-Point Numbers Accurately[SW90] and Printing Floating-Point Numbers Quickly and Accurately[BD96] are iterative and complicated. The read and write algorithms presented here do at most 2 and 4 bignum divisions, respectively. Over the range of IEEE-754[IEE85] double-precision numbers, the largest intermediate bignum used by the presented algorithms is 339 decimal digits (1126 bits). According to Steele and White[SW90], the largest bignum used by their algorithm is 1050 bits. These are not large for bignums, being orders of magnitude smaller than the smallest precisions which get speed benefits from FFT multiplication.

1 Digilant, 100 North Washington Street Suite 502, Boston, MA 02114. Email: [email protected]
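The writing direction with a specified digit count can be sketched similarly. This is a minimal Python illustration of rounding the exact binary value with one half-to-even bignum division per attempt; it is not the paper's exact algorithm, and it ignores zero, negatives, and non-finite values:

```python
from math import floor, frexp, log10

def write_float(x: float, ndigits: int) -> str:
    """Closest ndigits-digit decimal (as 'd' + 'e' + exponent) to a
    positive finite double x, via a single round-half-to-even
    bignum division of the exact binary value."""
    m, e2 = frexp(x)                      # x = m * 2**e2, 0.5 <= m < 1
    q = int(m * (1 << 53))                # exact 53-bit integer mantissa
    e2 -= 53                              # now x = q * 2**e2 exactly
    e10 = floor(log10(x)) - ndigits + 1   # tentative decimal exponent
    while True:
        # Form the exact ratio x / 10**e10 as num/den.
        num, den = q, 1
        if e2 >= 0:
            num <<= e2
        else:
            den <<= -e2
        if e10 >= 0:
            den *= 10 ** e10
        else:
            num *= 10 ** -e10
        d, r = divmod(num, den)
        if 2 * r > den or (2 * r == den and d & 1):
            d += 1                        # the single rounding step
        if len(str(d)) == ndigits:
            return f"{d}e{e10}"
        # The log10 estimate (or a carry up to 10**ndigits) was off
        # by one; adjust the exponent and round the exact value again.
        e10 += 1 if len(str(d)) > ndigits else -1
```

Each attempt rounds the exact value once, so the result is the closest ndigits-digit decimal; with 17 digits the output always reads back to the original double.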
Similar Papers
Not a Number of Floating Point Problems
Floating-point numbers and floating-point arithmetic contain some surprising pitfalls. In particular, the widely-adopted IEEE 754 standard contains a number that is “not a number,” and thus has some surprising properties. One has to be extremely careful in writing assertions about floating point numbers, to avoid these pitfalls. This column describes the problems and how a language might elimin...
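The surprising properties of IEEE 754 NaN mentioned in this snippet can be seen directly in any language with standard float semantics; a small Python illustration:

```python
nan = float("nan")

# NaN is unordered: it compares unequal even to itself,
# and every ordering comparison against it is False.
assert nan != nan
assert not (nan == nan)
assert not (nan < 1.0) and not (nan > 1.0)

# This breaks the trichotomy law that assertions about
# real numbers often rely on: x < 0, x == 0, or x > 0.
x = nan
assert not (x < 0.0 or x == 0.0 or x > 0.0)
```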
Accurate floating-point summation: a new approach
The aim of this paper is to find an accurate and efficient algorithm for evaluating the summation of large sets of floating-point numbers. We present a new representation of the floating-point number system in which a number is represented as a linear combination of integers and the coefficients are powers of the base of the floating-point system. The approach allows to build up an accurate flo...
Accurate and Efficient Algorithms for Floating Point Computation
Our goal is to find accurate and efficient algorithms, when they exist, for evaluating rational expressions containing floating point numbers, and for computing matrix factorizations (like LU, the singular value decomposition (SVD) and eigenvalue decompositions) of matrices with rational expressions as entries. More precisely, accuracy means the relative error in the output must be l...
Numerical Difficulties in Pre-University Informatics Education and Competitions
It is easy to underestimate the difficulties of using floating-point numbers in programming. This is especially the case in pre-university informatics education and competitions, where one is often led to believe that floating-point arithmetic is a good approximation of the real number system. However, most of the mathematical laws valid for real numbers break down when applied to floating-poin...
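The breakdown of real-number laws that this snippet describes is easy to demonstrate; for instance, addition in double precision is neither associative nor cancellative:

```python
# Associativity of addition fails in double precision:
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
assert left != right          # 0.6000000000000001 vs 0.6

# Absorption: adding 1 to 1e16 is lost entirely, because the
# spacing between adjacent doubles near 1e16 is 2.
assert (1e16 + 1.0) - 1e16 == 0.0
```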
Accurate Floating-Point Summation
Given a vector of floating-point numbers with exact sum s, we present an algorithm for calculating a faithful rounding of s into the set of floating-point numbers, i.e. one of the immediate floating-point neighbors of s. If s is a floating-point number, we prove that this is the result of our algorithm. The algorithm adapts to the condition number of the sum, i.e. it is very fast for mildly...
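The error-free transformation underlying faithful-summation algorithms of this kind can be sketched as follows. This is Knuth's classic TwoSum building block and a simple compensated loop over it, shown for illustration; it is not necessarily the cited paper's exact method:

```python
def two_sum(a: float, b: float) -> tuple[float, float]:
    """Error-free transformation: returns (s, t) with s = fl(a + b)
    and s + t == a + b exactly (Knuth's TwoSum, 6 flops)."""
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    b_round = b - b_virtual
    a_round = a - a_virtual
    return s, a_round + b_round

def compensated_sum(xs: list[float]) -> float:
    """Sum a vector while accumulating the per-step rounding
    errors recovered by TwoSum, then fold them back in."""
    s, comp = 0.0, 0.0
    for x in xs:
        s, t = two_sum(s, x)
        comp += t
    return s + comp
```

For example, the naive sum of [1.0, 1e-16, -1.0] is 0.0 because 1e-16 is absorbed, while the compensated sum recovers it.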
Journal: CoRR
Volume: abs/1310.8121
Pages: -
Published: 2013